A New Method for Segmenting Newspaper Articles

Authors

  • Basilios Gatos
  • N. Gouraros
  • S. L. Mantzaris
  • Stavros J. Perantonis
  • A. Tsigris
  • P. Tzavelis
  • Nikolaos Vassilas
Abstract

Digital preservation of old newspapers contributes greatly to the historical record of a country's social, political and economic events. Preservation is also an urgent necessity, because paper deteriorates quickly and the sheer volume of material makes information difficult to trace. Lambrakis Press S.A. owns a large collection of newspapers and periodicals comprising 1,300,000 pages and covering the period from 1890 to the present. The material consists of approximately 600,000 A2 pages, 500,000 A3 tabloid pages and 200,000 A4 pages. Our team works on every stage of transforming the printed material into an accessible digital archive: verification and quality control, digitization, cataloguing, search and retrieval, and design and content presentation. The resulting digital documents form the foundation of our digital library. Preserving and processing this valuable material requires tackling a series of problems related to its digitization, such as image enhancement by noise removal and the isolation of newspaper articles by document understanding techniques (segmentation and labeling). Solving these problems enables efficient subsequent cataloguing using OCR, full-text retrieval and information extraction techniques, along with manual indexing. In this paper we present the results of our research on segmenting the image into its constituent regions and on identifying the text regions, which must be separated from other regions such as figures, drawings or lines. Existing region segmentation techniques follow two fundamental approaches: first, smearing and labeling of regions [1-2], and second, image profiling in various directions [3-4].
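The two prior approaches named above can be illustrated on a toy binary image. The sketch below shows generic versions of them (horizontal run-length smearing followed by connected-component labeling, and projection profiles cut along empty rows and columns); it is not the authors' proposed method, which this excerpt does not describe. The function names, the 0/1 list-of-lists image representation and the smearing threshold of 2 are hypothetical choices made for illustration.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # hypothetical representation: 1 = ink (foreground), 0 = background


def smear_rows(img: Image, threshold: int) -> Image:
    """Horizontal run-length smearing: set to 1 every background run of length
    <= threshold that lies between two foreground pixels in the same row."""
    out = [row[:] for row in img]
    for r, row in enumerate(img):
        last_fg = None  # column index of the previous foreground pixel in this row
        for c, px in enumerate(row):
            if px == 1:
                gap = c - last_fg - 1 if last_fg is not None else 0
                if 0 < gap <= threshold:
                    for k in range(last_fg + 1, c):
                        out[r][k] = 1
                last_fg = c
    return out


def label_components(img: Image) -> Tuple[List[List[int]], int]:
    """4-connected component labeling by breadth-first search.
    Returns a grid of labels (0 = background) and the number of components."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] == 1 and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def horizontal_profile(img: Image) -> List[int]:
    """Number of foreground pixels in each row."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> List[int]:
    """Number of foreground pixels in each column."""
    return [sum(col) for col in zip(*img)]


def split_on_gaps(profile: List[int]) -> List[Tuple[int, int]]:
    """Half-open (start, end) ranges of consecutive non-zero profile entries,
    i.e. the bands left after cutting along completely empty rows or columns."""
    bands, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


page = [
    [1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
]

smeared = smear_rows(page, threshold=2)
print(label_components(page)[1])     # 5 components before smearing
print(label_components(smeared)[1])  # 3 components after smearing
print(split_on_gaps(horizontal_profile(page)))  # [(0, 1), (2, 3)]
print(split_on_gaps(vertical_profile(page)))    # [(0, 3), (4, 5), (7, 8)]
```

In this toy example the profile split can only cut where an entire row or column is background; articles that touch or interleave leave no such gap, which is consistent with the abstract's remark that the close contact between articles defeats these approaches.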
Neither technique has succeeded in segmenting newspapers, because of the haphazard layout of newspaper articles and their very close contact with one another. Furthermore, the first approach incurs a high computational cost. Aiming to solve these particular problems arising in newspaper segmentation, we suggest a new technique based on

Publication date: 1998